Chain-based Scheduling: Part I – Loop Transformations and Code Generation
Abstract
Chain-based scheduling [1] is an efficient partitioning and scheduling scheme for nested loops on distributed-memory multicomputers. The idea is to exploit the regular data dependence structure of a nested loop to overlap and pipeline communication and computation. Most partitioning and scheduling algorithms proposed for nested loops on multicomputers [1,2,3] are graph algorithms on the iteration space of the nested loop. Such graph algorithms are too expensive to implement in parallelizing compilers: their cost is at least O(N), where N is the total number of iterations, and they also need large data structures to store the resulting partition and schedule. In this paper, we propose compiler loop transformations and a code generation scheme that produce chain-based parallel code for nested loops on multicomputers. The cost of the loop transformations is O(nd), where n is the number of loops in the nest and d is the number of data dependences; both n and d are very small in real programs. These loop transformations and code generation techniques enable parallelizing compilers to emit parallel code that contains all the partitioning and scheduling information the processors need at run time.
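The abstract does not spell out how chains are formed, so the following is only a minimal sketch under our own assumptions, not the paper's algorithm: a 2-deep rectangular loop nest, a single uniform dependence vector d, and hypothetical helper names (`in_space`, `chain_heads`, `walk_chain`, `schedule`). It groups iterations linked by repeated steps of d into chains and deals whole chains to processors round-robin. The point it illustrates is that chain membership follows from the loop bounds and d alone, without building an iteration-space graph, and that keeping a chain on one processor means dependence d never crosses processors.

```python
from typing import Iterator, List, Tuple

Point = Tuple[int, int]  # an iteration (i, j) of a 2-deep loop nest (assumed)


def in_space(p: Point, n1: int, n2: int) -> bool:
    """True if iteration p lies in the rectangle 0 <= i < n1, 0 <= j < n2."""
    i, j = p
    return 0 <= i < n1 and 0 <= j < n2


def _require_nonzero(d: Point) -> None:
    # A zero vector would make every iteration its own predecessor.
    if d == (0, 0):
        raise ValueError("dependence vector must be non-zero")


def chain_heads(n1: int, n2: int, d: Point) -> List[Point]:
    """Iterations whose predecessor p - d lies outside the iteration space.

    Every iteration has at most one predecessor along d, so each head
    starts exactly one chain and the heads count the chains.
    """
    _require_nonzero(d)
    d1, d2 = d
    return [
        (i, j)
        for i in range(n1)
        for j in range(n2)
        if not in_space((i - d1, j - d2), n1, n2)
    ]


def walk_chain(head: Point, n1: int, n2: int, d: Point) -> Iterator[Point]:
    """Yield head, head + d, head + 2d, ... while still inside the space."""
    _require_nonzero(d)
    i, j = head
    while in_space((i, j), n1, n2):
        yield (i, j)
        i, j = i + d[0], j + d[1]


def schedule(n1: int, n2: int, d: Point, num_procs: int) -> List[List[Point]]:
    """Deal whole chains to processors round-robin (a hypothetical policy).

    Each chain is appended in dependence order, so the source of
    dependence d always precedes its sink in the same processor's list.
    """
    per_proc: List[List[Point]] = [[] for _ in range(num_procs)]
    for k, head in enumerate(chain_heads(n1, n2, d)):
        per_proc[k % num_procs].extend(walk_chain(head, n1, n2, d))
    return per_proc


if __name__ == "__main__":
    # 4 x 5 iteration space, dependence (1, 1), three processors.
    for rank, work in enumerate(schedule(4, 5, (1, 1), num_procs=3)):
        print(rank, work)
```

For d = (1, 1) on a 4 x 5 space, the chains are the eight diagonals with constant i - j. When a loop has more than one dependence vector, the remaining dependences cross chain boundaries and would require messages between processors; per the abstract, chain-based scheduling overlaps and pipelines that communication with computation, which this sketch does not attempt to model.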
Similar resources
Message-passing code generation for non-rectangular tiling transformations
Tiling is a well-known loop transformation used to reduce communication overhead in distributed memory machines. Although a lot of theoretical research has been done concerning the selection of proper tile shapes that reduce processor idle times, there is no complete approach to automatically parallelize non-rectangularly tiled iteration spaces and consequently there are no actual experimental ...
Universität Passau, Fakultät für Mathematik und Informatik: Automatic Code Generation in the Polytope Model
Statutory declaration: I hereby declare under oath that I wrote this diploma thesis independently, without using any sources or aids other than those indicated, and that all passages taken over verbatim or in substance have been marked as such. This diploma thesis has not previously been submitted in the same or a similar form to any other examination board. Abstract: In rece...
Using Performance Bounds to Guide Pre-scheduling Code Optimizations
We advocate using performance bounds to guide code optimizations. Accurate performance bounds establish an efficient way to evaluate benefits as well as overheads of code transformations without actually performing instruction scheduling. In this paper, we introduce a novel bound-guided approach to systematically regulate code size related instruction level parallelism (ILP) optimizations inclu...
Integrating Program Optimizations and Transformations with the Scheduling of Instruction Level Parallelism
Code optimizations and restructuring transformations are typically applied before scheduling to improve the quality of generated code. However, in some cases, the optimizations and transformations do not lead to a better schedule or may even adversely affect the schedule. In particular, optimizations for redundancy elimination and restructuring transformations for increasing parallelism are oft...